{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c5623490-2c80-43ad-8d49-db90291e8b5c",
   "metadata": {},
   "source": [
    "#  08. 分子信息整合\n",
    "- data file: data/logP.csv\n",
    "- 目的：化合物名称、结构式、CAS号的互换,补全表格信息\n",
    "- 了解REST API的使用"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5622c567-81ee-4260-a93d-277f0be5c3b7",
   "metadata": {},
   "source": [
    "## NPClassfier\n",
    "\n",
    "![NPClassfier](img/NPClassfier.jpeg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6eee53ef-e875-4cd4-b096-a843e068326c",
   "metadata": {},
   "source": [
    "**基础知识**\n",
    "- url：https://npclassifier.ucsd.edu/classify?smiles=O=C(O)C(=O)CO\n",
    "- 返回值文本是JSON，什么是JSON\n",
    "- 连接状态码：\n",
    "    - 200：请求成功\n",
    "    - 400：请求的url中有语法错误，服务器不理解\n",
    "    - 404：请求的资源不存在\n",
    "    - 408：请求超时\n",
    "    - 500：服务器端遇到错误\n",
    "- python的模块requests和urllib\n",
    "- 数据获取方式分为：GET和POST"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "176b9518-4e6c-4416-a3a8-c96e0b4db8e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import json\n",
    "import urllib\n",
    "from tqdm import tqdm\n",
    "\n",
    "def np_classfy(smiles):\n",
    "    smiles = urllib.parse.quote(smiles)  \n",
    "    url = f'https://npclassifier.gnps2.org/classify?smiles={smiles}'\n",
    "    response = requests.get(url)\n",
    "    if response.status_code == 200:\n",
    "        info = json.loads(response.text)     \n",
    "        family     = info['class_results'][0]      if len(info['class_results']) > 0      else ''\n",
    "        superclass = info['superclass_results'][0] if len(info['superclass_results']) > 0 else ''\n",
    "        pathway    = info['pathway_results'][0]    if len(info['pathway_results']) > 0    else ''\n",
    "    elif response.status_code == 500:\n",
    "        family = superclass = pathway  = 'unknown'\n",
    "    else:\n",
    "        family = superclass = pathway  = f'connection error: {response.status_code}'\n",
    "    return {'family': family,\n",
    "            'superclass': superclass,\n",
    "            'pathway': pathway}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "88a6ad88-92b9-47be-a3d9-2df136b87a90",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'family': 'Iridoids monoterpenoids',\n",
       " 'superclass': 'Monoterpenoids',\n",
       " 'pathway': 'Terpenoids'}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "smiles = 'CC1C(O)CC2C1C(OC1OC(COC(C)=O)C(O)C(O)C1O)OC=C2C(O)=O'\n",
    "np_classfy(smiles)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "0cf7f210-a495-45d2-8ce0-cbec5a96cd7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_csv('data/LC-MS.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7306f66b-5f89-44af-b384-39cbac3773a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(144, 17)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.drop_duplicates(subset='INCHIKEY')\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "11819778-6525-40da-b0e2-c585f9723050",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|███████████████████████████████████████████████████| 144/144 [02:03<00:00,  1.16it/s]\n"
     ]
    }
   ],
   "source": [
    "## fill df\n",
    "\n",
    "for idx in tqdm(df.index, ncols=60):\n",
    "    smiles = df.loc[idx, 'SMILES']\n",
    "    info = np_classfy(smiles)\n",
    "    df.loc[idx, 'family'] = info['family']\n",
    "    df.loc[idx, 'sup[superclass'] = info['superclass']\n",
    "    df.loc[idx, 'pathway'] = info['pathway']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d5f48c8b-99de-4ab5-8487-cb47f82434ef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>SMILES</th>\n",
       "      <th>family</th>\n",
       "      <th>sup[superclass</th>\n",
       "      <th>pathway</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [SMILES, family, sup[superclass, pathway]\n",
       "Index: []"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "onto = df[['SMILES', 'family', 'sup[superclass', 'pathway']]\n",
    "onto[onto['family'].str.contains('unknown')]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39bd5dee-d474-496f-b018-a074bdf3fd1b",
   "metadata": {},
   "source": [
    "## pubchem REST API获取化学信息\n",
    "\n",
    "**检索方向**\n",
    "- 根据结构检索名称\n",
    "- 根据名称检索结构\n",
    "\n",
    "**REST API**\n",
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43d136d8-2822-432e-b9ea-b24c2763e81d",
   "metadata": {},
   "source": [
    "### 根据结构检索名称\n",
    "- REST API规则\n",
    "- 思路：\n",
    "    - simils或其它结构式记录转换为inchikey，减少信息的模糊度\n",
    "    - inchikey嵌入到REST API的url中，执行联网检索\n",
    "    - 解析返回的数据，获得 upac_name, title。Title是常用名"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "205befcc-cb6f-4ad6-8065-10203b5f588b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "IUPAC Name: benzene\n",
      "Title: Benzene\n"
     ]
    }
   ],
   "source": [
    "import requests  \n",
    "\n",
    "def get_compound_name(inchikey):  \n",
    "    # Step 1: Use InChIKey to get CID  \n",
    "    url = f\"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/{inchikey}/JSON\"  \n",
    "    response = requests.get(url)  \n",
    "    \n",
    "    if response.status_code == 200:  \n",
    "        data = response.json()  \n",
    "        try:  \n",
    "            cid = data['PC_Compounds'][0]['id']['id']['cid']  \n",
    "            # Step 2: Use CID to get compound name  \n",
    "            name_url = f\"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/property/IUPACName,Title/JSON\"  \n",
    "            name_response = requests.get(name_url)  \n",
    "            if name_response.status_code == 200:  \n",
    "                name_data = name_response.json()  \n",
    "                properties = name_data['PropertyTable']['Properties'][0]  \n",
    "                return properties.get(\"IUPACName\", \"No IUPAC Name Found\"), properties.get(\"Title\", \"No Title Found\") \n",
    "                # 注意这里字典数据的get用法，可指定键名不存在时的默认替换值\n",
    "            else:  \n",
    "                print(\"Failed to fetch compound name\")  \n",
    "        except KeyError:  \n",
    "            print(\"CID not found in the response!\")  \n",
    "    else:  \n",
    "        print(\"Failed to fetch CID\")  \n",
    "\n",
    "# Example usage  \n",
    "inchikey = \"UHOVQNZJYSORNB-UHFFFAOYSA-N\"  \n",
    "iupac_name, title = get_compound_name(inchikey)  \n",
    "print(f\"IUPAC Name: {iupac_name}\")  \n",
    "print(f\"Title: {title}\")  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c01d1bad-971d-4c97-9f64-2f38a342d1e4",
   "metadata": {},
   "source": [
    "#### 根据名称获得化合物结构信息（smiles）\n",
    "- 一个化合物有很对名称，有的化合物有几十个名称\n",
    "- pubchem API不支持从化合物名称直接获得结构数据\n",
    "- 但pubchem API支持从名称检索CID（pubchem的compound id号）\n",
    "- pubchem API也支持从ID号检索结构信息（如smiles）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "aac6b4c8-db36-4fe1-b763-b3130b7a077a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Compound Name: Aspirin\n",
      "Canonical SMILES: CC(=O)OC1=CC=CC=C1C(=O)O\n"
     ]
    }
   ],
   "source": [
    "import requests  \n",
    "\n",
    "def get_smiles_from_name(compound_name):  \n",
    "    \"\"\"  \n",
    "    通过化合物名称获取结构式的 SMILES 字符串 (CanonicalSMILES)  \n",
    "    \n",
    "    Args:  \n",
    "        compound_name (str): 化合物名称，例如 \"Aspirin\"  \n",
    "    \n",
    "    Returns:  \n",
    "        str: SMILES 字符串，如果失败返回错误信息  \n",
    "    \"\"\"  \n",
    "    try:  \n",
    "        # Step 1: 获取 CID  \n",
    "        cid_url = f\"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{compound_name}/cids/JSON\"  \n",
    "        response = requests.get(cid_url)  \n",
    "        if response.status_code == 200:  \n",
    "            data = response.json()  \n",
    "            if \"IdentifierList\" in data and \"CID\" in data[\"IdentifierList\"]:  \n",
    "                cid = data[\"IdentifierList\"][\"CID\"][0]  \n",
    "            else:  \n",
    "                return \"No CID found for the given compound name.\"  \n",
    "        else:  \n",
    "            return f\"Failed to fetch CID. Status code: {response.status_code}\"  \n",
    "        \n",
    "        # Step 2: 获取 SMILES  \n",
    "        smiles_url = f\"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/{cid}/property/CanonicalSMILES/JSON\"  \n",
    "        smiles_response = requests.get(smiles_url)  \n",
    "        if smiles_response.status_code == 200:  \n",
    "            smiles_data = smiles_response.json()  \n",
    "            if \"PropertyTable\" in smiles_data and \"Properties\" in smiles_data[\"PropertyTable\"]:  \n",
    "                smiles = smiles_data[\"PropertyTable\"][\"Properties\"][0].get(\"CanonicalSMILES\", None)  \n",
    "                if smiles:  \n",
    "                    return smiles  \n",
    "                else:  \n",
    "                    return \"No SMILES found for the given CID.\"  \n",
    "            else:  \n",
    "                return \"Failed to parse SMILES response.\"  \n",
    "        else:  \n",
    "            return f\"Failed to fetch SMILES. Status code: {smiles_response.status_code}\"  \n",
    "    \n",
    "    except Exception as e:  \n",
    "        return f\"An error occurred: {e}\"  \n",
    "\n",
    "\n",
    "# 示例  \n",
    "compound_name = \"Aspirin\"  \n",
    "smiles = get_smiles_from_name(compound_name)  \n",
    "print(f\"Compound Name: {compound_name}\\nCanonical SMILES: {smiles}\")  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4464bfee-faab-4d40-bf4a-080d27ac37c5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAcIAAACWCAIAAADCEh9HAAAABmJLR0QA/wD/AP+gvaeTAAAeWUlEQVR4nO3deVhU9f4H8PcMywiCLKK4s7hgCL8UBURBFNMAIQWatNRupmY9t7C6ld1rj8bNyrZH+OUv8xqkttrIICgiiGiCQoLLDcEFAQGRRVlk32bO749DIyIqAjNnzpnP66/mM2fmfOZ58N053/M93yNiGAaEEEL6Ssx1A4QQwm8Uo4QQ0i8Uo4QQ0i8Uo4QQ0i8Uo4QQ0i8Uo0Tg6urqIiMjw8LC6urquO6FCJOIJjwRAbt586azs3N1dTUAfX19X19fqVQaHBxsYmLCdWtEOChGiWCVlZX5+vr++eefYrHY0NCwpaWFrRsbG/v7+wcHBy9atGjIkCHcNkkEgGKUCFNBQcHTTz997do1W1vb3bt3e3t7X79+PTY2ViaTnT59mv2zl0gkXl5eAQEBy5Yts7a25rplwlcUo0SAsrKyFi1aVFlZ6erqGh8fP2zYsK7vlpSUJCQkHDx48MiRIx0dHQD09PRmzpwplUqlUumoUaM46prwFcUoEZpjx44FBQXV19c/9dRTcrnc1NT0QVtWVVXFx8fLZLKkpKS2tjYAYrHYw8MjMDAwJCRkwoQJGuya8BjFKBGUH3/88eWXX25vb1+5cmVkZKSBgYHqrfPnzzs6Okokkvs/VVNTc/DgwUOHDh0+fLixsZEtOjo6SqXSZcuWTZ48WUPdE36iGCXCERER8fbbbyuVytDQ0G3btonFd+fztba2Dh8+vKOjw8fHRyqVBgUF9XiU2tTUdOzYMZlMduDAgfr6erbI5mlgYOD06dM19EsIvzCE8J9SqXzvvfcAiESizz///P4Nrl27Nm3aNNWfvbGxcXBw8E8//XTnzp0ev7C5uTkuLm7lypXm5uaqT9nZ2YWGhqampiqVSjX/IMIndDRKeK+jo2PdunVRUVGGhoa7d+9+/vnnH7RlUVHRgQMHul6sV11cetDFeoVCkZ6eLpPJfvvtt/LycrY4bty4JUuWSKXSWbNmdT3mJbqJYpTwW2Njo1QqTUhIMDExkclkvr6+vfnUrVu3EhISZDJZ7y/Wq/I0Ojq6tLSULVpZWfn5+UmlUl9f367jsESnUIwSHquqqgoMDExPT7e2tj58+LCLi0sfvuH+i/XTpk0LCAhYvnz5xIkT7/+IUqlMT0+Xy+Vyufz69ets0cjIaNy4cVFRUbNmzerfbyL8QzFK+KqwsNDX1/fq1at2dnaJiYk9Rl7v1dTUJCcnHzx4UC6X9/5ifU5Ojkwm27dv3+XLlwEMGTKkpqaGTvN1DcUo4aXs7Gw/P7/S0tLp06fHx8cP4D1Izc3NycnJMpksNjZWtZqJo6NjYGBgQECAp6dnj586cOBASEiIUqlMSkpasGDBQDVDeIFilPDP8ePHg4KC7ty54+PjExMTo6b74ltaWo4ePXro0KGYmJhbt26xRTs7u8DAQKlUOnv2bJFI1HV7V1fXrKys7777bvXq1eroh2gvDmcJENIHcrl80KBBAEJCQpqbmzWwx9bW1oSEhDVr1nS9qdTGxuatt97Kzs5WbbZ161YAr7/+ugZaIlqFYpTwyddff82OPIaGhioUCg3vXaFQpKamhoaGjh49mg3TyMhI1bspKSkA3NzcNNwV4Ryd1BN+YBgmLCwsLCxMJBJt2rTpww8/5LAZpVKZkZEhl8vff/99KysrtlhfX29hYaGnp1dXV9fjLadEqChGCQ8oFIrXXntt165d+vr6O3bsWLNmDdcd9czZ2fnixYsZGRnu7u5c90I0h2ZmEG3X1NS0ePHiXbt2DR48ODY2VmszFACbnn/88QfXjRCNohglWq26unrhwoXx8fGWlpZJSUn+/v5cd/Qwbm5uAM6cOcN1I0Sj9LlugJAHKioq8vX1vXz5sq2t7ZEjRxwcHLju6BHoaFQ30dgo0VI5OTl+fn4lJSVOTk4JCQljxozhuqNHUygU5ubmjY2NlZWVqktPRPDopJ5oo9jYWG9v75KSEm9v77S0NF5kKAA9PT0XFxeGYTIzM7nuhWgOxSjROh988MGSJUuqqqqeffbZxMREMzMzrjt6DDQ8qoNobJRonW+++QaAs7Pzvn37eLfMBw2P6iCe/Y0SXdDQ0ABg48aNvMtQdIlRuuqgO/j3Z0oEjz0vLisrY18qFIrvv//+zTff5LSp3ho7duyoUaOqq6vz8/O57oVoCMUo0TpLly4FkJ2dzb7U09PbuHFjREREQUEBp331lqurK2h4VJdQjBKtw54Xd40hNpj4MuBIw6O6hmKUaJ2pU6dKJJLc3FzVqsn8uvzNdksxqjsoRonWMTQ0nDp1qlKpPHv2LFvh1/Gdm5ubWCy+cOFCa2sr170QTaAYJdqo2wGdq6urWCw+f/48+9Q5LWdqajp58uTW1tY///yT616IJlCMClRbG2Jj8ckn+PhjxMSgpYXrhh5Pt+FRMzMzBweHlpYWvgQTvw6fST9RjApRbi6eeAIvvYQTJ3DyJNasgYMDLlzguq3HwB6NZmRkdKvwa3iUL92SfqIYFZzWVixejLFjUVSEpCQkJuL6ddjbY/FiNDVx3VxvTZgwYejQoWVlZTdu3GAr/Dq+41e3pJ8oRgUnNhb5+di1C6rnZZqa4uuvceMGZDJOO3sMIpGo2yQnfgWTs7Pz4MGD8/LyqqqquO6FqB3FqOCcPo1hwzBx4j1FJyeMHo3TpznqqS+6DY86OzsbGRldvXq1pqaG0756RV9ff9q0aQzDZGVlcd0LUTuKUcG5fRsjR/ZQHzMGt29rvJu+63ax3sDAgF/BxK/DZ9IfFKOCY2yMO3d6qNfUwNhY4930nbu7u0gkysrK6ujoUFXAn2Ciq0y6g2JUcBwdUVqKxsZ7ivX1KCqCkxNHPfXF0KFD7e3tGxsbL126xFb4FUxs6GdkZNBST4JHMSo4ISEAsH37PcWvv0Z7O557jpOO+qzbeT2/jkZtbGysra2rqqoKCwu57oWoF8Wo4Iwdi61bsXEjQkNx6BDi4/HWW9i0CVu2wM6O6+YeT7fctLOzs7a2rqysvH79Opdt9RrdXK8jKEaF6O23ERODy5exdi3WrEF2Nvbvx4YNXLf12ISx1BNfRiFIn1GMClRgIJKSUFaGsjIkJ2PJEiQkYOPGnq8+aatp06ZJJJKcnBx2PXzwc3iUL6FP+oxiVKDa2pCail9/vVv56CN88gl49cRKiUTi7OysUCh4sdSTUql855130tLSVBV2RZVz58418ef+MdIHFKMCdecO5szB2rVQKDorbm4AoJUB9BDdcpNdg+7s2bPt7e2c9tVda2vrCy+88NVXX0mlUlVompmZGRkZtba2RkdHc9seUSuKUYEaNgx2dmhoQG5uZ8XdHQB4cjqs0u0s3tzcfOLEiS0tLapHjGiDhoaGxYsX79u3z8TEZM+ePcZ/zc+tqalhJ73m5eVx2iBRL4pR4eqWm+zLLmsm8cL9Z/HaNjxaUVHh7e2dmJg4YsSIkydPLly4kK1XV1d7enq2trbq6emtWrWK2yaJWlGMChebm6oAsrfH8OGorERREYdNPa5JkyZZWFjcuHHj5s2bbEWrhkcLCgq8vLzOnTtnb2+fmpo6bdo0tn7z5s158+bl5uZaWVnFxcXZ8W2qGXksFKPCxQ6Gdj1qmzED4NnwqGqpJ9Xhp/ZMxszKyvLw8MjLy5sxY0Z6evqECRPYem5u7syZM//8888pU6acP3/e39+f2z6JulGMCpeLCwwNcfEi/potxOvhUVVuTp061cjI6PLly7W1tRx2lZKSMn/+/MrKyvnz56ekpAwfPpytZ2RkeHt7l5SUzJw58/fffx8zZgyHTRLNoBgVrkGD4OwMhQLnznVWup3m80S3SewGBgZPPvkkwzCqWVCa99NPP/n5+dXV1a1YsSIhIcHU1JStx8XF+fj43L59e/HixSkpKUOHDuWqQ6JJFKOC1i03XV0hEuHsWWjZbKGHY2M0MzNT8dfkLW6HRyMiIl588cW2trbQ0NA9e/YYGBiw9d27d4eEhDQ3N69atWr//v1GRkactEc0j2JU0LoNj1paYsIENDcjJ4fDph7XsGHDbG1t6+vrL1++zFa4uljPMMyHH3745ptvMgzz2WefRUREiMWd/4I+++yzVatWdXR0bNiwISoqSl9fX8O9ES4xRMAuXWIAZuzYu5UVKxiA+fZb7nrqi6VLlwKIjIxkX167dg3AiBEjNNlDe3v76tWrARgaGv7000+qekdHx2uvvQZAT09vx44dmmyJaAk6GhU0BwdYWKCkBH/NFhLG8Ki9vb2/v//y5cs19tj6xsbGZ555JjIycvDgwbGxsS+88AJbZ29e2rFjh0Qi+fXXX1999VXN9EO0C9c5TtRswQIGYGJiOl/+8QcDMFOmcNrTY2NvVJ86dSone6+qqvLw8AAwdOjQ9PR0Vb2mpmbOnDkAzM3NT548yUlvRBvQ0ajQdZvkNHUqJBJcuoS6Og6belwuLi4GBgbZ2dmN3Vb1V7/r16/PmjUrPT3dzs7u9OnTM2fOZOtlZWXz5s07efLkyJEjT5w44eXlpeHGiPagGBW6biuSGBo2LlxY4OFx7cIFDpt6XEZGRuxSTxp+nt3Fixc9PT2vXLni7Oycmpo6adIktp6fn+/l5XXhwoUnnngiIyPjySef1GRXRNtQjAqdm5ty6NCbCoVSqWQL/7KzG3/qlOzUKW77eizt7e0tLS3Dhw9/5plnXnzxxYMHD2pgVPTEiROenp6lpaXz5s1LS0sbPXo0W8/MzPTw8MjPz3dzczt58uS4cePU3QnRdlyPKhC1s7W1BXDx4kX25Y8//ghgyZIl3HbVe/X19b6+vgCMuzzZ1MLC4m9/+1tsbGxzc7M6diqXywcNGgQgODi46y6SkpLYyfYBAQGNjY3q2DXhHYpR4WNnC0VFRbEv2UXbRo4cyW1XvVRVVTVr1iwA1tbWWVlZFy9e3Lx58/Tp01V5amRkFBAQsGfPnjt37gzUTrdv385OCH3jjTcUCoWqvnfvXnayPTv9fqB2R/iOYlT4vvrqKwDr1q1jXyqVSisrKwDFxcXcNvZIhYWFDg4OAOzs7K5evdr1rYKCgvDw8NmzZ4tEIjZPBw0aFBAQsHPnzsrKyj7vUalUbt68GYBIJNq8eXPXt8LDw9l9hYaGKpXKPu+CCA/FqPCxs4WmTZumqvj5+QGQyWQcdvVI2dnZ7HCks7NzaWnpgzYrKipi81R1Q5Gent7s2bPDw8PLysoea48dHR1r165lv+E///mPqs4+HYTN1i+//LLvP4kIFMWo8DU1NRkYGOjr6zc0NLAV9oDr3Xff5baxhzh+/LiZmRkAHx+fXp6t37p1a8+ePQEBAaqb3MViMZunN27c6M03NDQ0uLu7Dx48OD4+XlVsbW1dtmwZAENDw19++aWPv4cIGsWoTmCXE05NTWVfHj58GMCcOXO47epBVJd32JU+HvfjVVVVbJ5KJBLVEKqjo+PmzZuvXLny8M/eunXrzJkzqpf19fVPP/00ABMTk8TExMf+JUQ3UIzqBPYmRdUJaVVVlUgkMjY2bm9v57ax+6ku74SGhna9vNMHjY2NcXFxK1euNDEx6ZanOTk5j/x4eXm5i4sLgBEjRpw7d64/nRBhoxjVCVFRUQCee+45VYVdqv3ChQscdtXNQy7v9FNTUxObp0OGDFHlqb29fWhoaGpqao/XiwoKCiZOnMhulpeXN4DNEOGhGNUJOTk5AGxsbFQVdnGNrhdSONbe/vO//gXAwMBgz549atpJc3Pz0aNHQ0NDVYvVA7C1te2Wp1lZWewGM2bMqKioUFMzRDAoRnWCUqk0NzcHUF5ezlYiIiIArF69mtvGOjU2MgEBCrF41dy5hw8f1sAO29raEhMTX3nlla55Onbs2PXr12/bto09aJ0/f35dXZ0GmiF8RzGqK+bPnw8gLi6OfZmRkQHAycmJ264YhmGqq5nZsxmAsbRkTp3S8M4VCkVqauqGDRvGjx/PhimboStXrqQJ9qSX6J56XdH1wXC1tbVOTk4SiSQ3N7eO26Webt6EtzdOnYKNDU6dwqxZGt6/WCz29PTcunVrXl5eRkbG2rVr6+vrLS0tuz4dhJCHoxjVFezKxxkZGS4uLpaWliNGjDAxMVEqlZp/FMddOTmYORPZ2ZgyBWlpmDyZs04AkUjk7u6+c+dOc3Pz6urqsrIyDpsh/EIxqivYGP3999/Pnz/PMExDQ0NVVRWAkJCQ1atXx8fHt7a2arShjAx4e6OkpPNoVDseRCwSiWbMmAEuHvRE+ItiVFeYmJgMGjSoo6NDIpF88cUX7Bofjo6OdXV1UVFRAQEBlpaWgYGBe/fura+vV3s3sbHw8UFVFZYsQUICzMzUvsde4/axo4SXuB6cJZpQVlbG3sg0YsQI9mhUJT8/v9saH6o1k2pra9XSTVQUo6/PAMzLLzPaN/8/Li4OgI+PD9eNEN6gGBW+/Px8drL9+PHjHzKT/Nq1a5999pm7u7sqTyUSybKgICYqirl9e8C62bqVARiA2bBhwL5zQFVUVAAwNTXt6OjguhfCDxSjApeZmclOjXR1de3lCnIlJSU7d+4MCAjQ19f/yM2NARg9PWb2bCY8nHnwSku98uqrnd+2a1e/vkfNui10TcjDUYwKWXJyMjsL8qmnnurDTPKKiorivXuZhQsZA4POQ0ixmPHyYrZtY4qKHvHh8nLmzBnmv/9lWlvvFrduZSQSRrsX6GP+Wug6MjKS60YIP1CMCtYPP/zAznwcgJnk1dXMnj2MVMoMHtyZpwDj6Mhs3sxcutR94/x8Zt48BmCMjBiRiBkyhNmyhVHdt56f369ONKLbQteEPBzFqDCFh4cP1DpJ92hsZOLimJUrGVPTe/J0wwaGXYWvupoZN47x9GSysxmFgqmrYyIjGYmE2bRpwHpQP3ah66lTp3LdCOEHilGhUSqVGzZsACASiT7//HN17aapiZHLmRUrGHPzzjDV02MqKpgtWxiJhCkpuWfjd99ljIyYqip1NTPQ2IWu9fT0VAtdE/IQIoZhNDm/iqhVR0fHunXroqKiDA0Nd+/e/fzzz6t9l21tOHYM0dGoqUF0NObMgYkJDh++Z5vcXEyZgpgYLFmi9n4GyPTp08+dO3fy5EkvLy+ueyHajqbfC0djY+MzzzwTFRVlYmISGxuriQwFYGgIPz989x2iowHgxg3c/9x2OzuIRCgp0UQ/A4RdgoDuZSK9QTEqEFVVVQsWLEhISLC2tj5x4gT7YHcOiERQKLoXFQowDPT0uGioj+heJtJ7+lw3QAZAYWGhr6/v1atX7ezsEhMT2WXbuWFri6Ki7sXCws63+INilPQeHY3yXnZ2tpeX19WrV52dndPS0rjMUAB+fjh+vDM3Vfbtw5Ah4NUg4+TJk83NzYuLi2mpJ/JIFKP8dvz4cS8vr9LSUh8fn7S0tFGjRnHc0Lp1sLXF0qU4exYdHaitxTff4IsvsGkTTE057u1xiESi6dOnA8jMzOS6F6LtKEZ5LCYmxt/f/86dOyEhIfHx8V2f18YZU1McP45hw+DmBhMTWFhg0yZ88QX+8Q+uO3tsdF5PeonGRvlq+/bt69evVyqVoaGh27ZtYyfba4UxYxAfj+pqXL8OIyNMmsSvi0sqFKOkl2jeKP8wDBMWFhYWFiYSiTZt2vThhx9y3ZEwVVZWWltbm5qa1tbWatH/pYj2oRjlGYVC8dprr+3atUtfX3/Hjh1r1qzhuiMhs7GxKS4uzs3NfeKJJ7juhWgv+n8snzQ1NS1evHjXrl2DBw+OjY2lDFU3Oq8nvUExyhvV1dULFy6Mj4+3tLRMSkry9/fnuiPh6/o4VUIehC4x8UNRUZGvr+/ly5dtbW2PHDni4ODAdUc6gT0apVtCycPR2CgP5OTk+Pn5lZSUODk5JSQkjNGOh2jqgqamJjMzM5FIVFtba2xszHU7REvRST0P7N27t6SkZO7cuWlpaZShmmRsbOzk5NTe3n7+/HmueyHai2KUBz799NPw8PAjR46YadODiHUEXWUij0QxygNisXj9+vUSiYTrRnQRrZhHHolilJCHoaNR8kh0iYmQh1EqlRYWFnV1deXl5dbW1ly3Q7QRHY0S8jBisXjGjBmg83ryYBSjhDwCOzy6fv36b7/9try8nOt2iNahk3pCHuHAgQNBQUHsf4vFYg8Pj8DAwJCQkAkTJnDbGNESFKOEPEJxcbGNjY1YLA4ODk5ISGhsbGTrjo6OUql02bJlkydP5rZDwi2KUUIeQalUGhgYKJXK5ORkDw+PY8eOyWSyAwcO1NfXsxs4OjoGBgYGBAR4enpy2yrhBMUoIQ/T2tq6YsWK/fv3GxgY3L59W/WIgebm5iNHjsjl8kOHDtXW1rJFBweH4ODgpc8++6SLC3ctE02jGCXkgRoaGoKDg48ePWpsbLxz584VK1bcv41CoUhPT5fJZL/99ht7AerrWbNev3EDS5YgMBBz50KfFgASOIpRQnpWXl7u7++vupteT09v5syZUqlUKpX2+OhAhULx+++/y+Xy9/PyxiQldVZHjkRQEIKDMXcuTx+mQh6JYpSQHuTn5/v6+l67ds3BwWHVqlXHjx9PSUlpb2/HXxfrQ0JCgoODbWxsev58Tg5kMvzyC65e7axYWmLRIkilWLgQdF+vsFCMEtJdZmbmokWLbt265ebmFh8fb2VlBaC2tvbo0aMHDx6Uy+XdLtYvXbr0gU8ZYfNUJkNubmfF2Bg+PpBKERwMExNN/B6iZhSjhNwjOTk5ODi4vr4+ICBg37599y8z2tzcnJycLJPJYmNj6+rq2OKjL9bn5CA6GnI5/vvfzoqxMfz98fLL8PNT148hmsEQQv7yww8/GBgYAHjxxRfb2toevnFTU9OBAwdWrlxpbm7O/msaIpF0TJ3KvP8+k5nJKJU9f6ywkAkPZ2bPZkQiBmA2buwsvvQSM2oUY2jIjB7NvPwyU1Q00D+OqAsdjRLSKSIi4u2331YqlaGhoeHh4SKRqJcfbGtrS0lJkcvlDgUF/zh2rLNqY4PgYAQHY9Ys9Ph85pISyOVYuBDGxnB3h7091q/H+PG4dg1ffonSUpw5g7FjB+jHETWiGCUEUCq3btz4z61bxWJxRETE66+/3ufvwenTkMkQHY3S0s6ilRX8/CCVwtcXBgY9fGrpUly4gAsXYGTUWamvx8SJ8PHBzz/3sROiQRSjROe1teGll5ouXHCsqNj6f/+3bNmygfnaXl6sb2uDmRk2bcI//3nPxzdtwldfoba25+Ql2oRilOi2+noEByM5GUOGNMXFGXt7D/wuzp2DXI7oaFy+3FkxM8OiRQgJweLFKCjApEn47TdIpfd86uefsXw5CgpgZzfwLZEBRQvlER1WUYG5c5GcjBEjcOKEWjIUgIsLtmzBpUvIz0d4OGbPRl0dfv4Zf/87RCJ0dAC4ezqvws6FamtTS0tkQNFtakRXFRbi6aeRlwd7exw5gokT1b5H9iLS+vXIz0d0NAwMIBbD2hoi0d2BVJXiYohEGDFC7V2RfqOTeqKTzp7FokWoqMCMGYiPx/DhXDbj7AxHR+zbd08xKAhFRTh3jqOeyGOgk3qie1JS4OODigrMn49jxzjOUADvv4/9+/Hjj3cr33+P2NjuF52ItqKjUaJjoqOxYgVaWrB8Ob7/Xluug3/8Mf79b4wejYkTcfUqbt7ERx/hvfe4bov0CsUo0SX/+7946y0olQgNxbZtPc+K58qNGzh+HGVlGDUKPj7oaREpop0oRoluYBiEhSEsDCIRPv0UGzZw3RARDopRohtqauDqiuJifP89li/nuhsiKNp0UkPIw128iEmTcOXKPcUPPkDX+44qKvDGG5gwAZaWsLXF2rUoLgYACwskJiI+njKUDDiaN0r4o6UFeXlobb2nWF7eGZQAKirg7o4hQ7BxIyZNQmEhIiLg6or0dNjbY/x4jB+v+a6J4FGMEgH54APU1+P8eVhYAMDs2QgKgoMD3nkHcjnXzRHBopN6IhRKJfbvx+rVnRnKGjwYf/87Dh1CczN3nRGBo6NRwjfbt98zYf7s2c4b0isrUVsLB4fu2zs6or0dBQWYMkVzTRJdQjFK+Ka8HC0td182NHTGKLuKh6lp9+3ZSrcRVUIGDsUo4ZstW/A//3P35Zo1nU+Ls7KCnh5u3Oi+PXsBitb4IGpDY6NEKIyN4eKC+Pju9UOHMGEC3RRE1IdilAjIe+8hJQW7d9+txMQgJoZuTidqRSf1RECefRYff4xXXsEnn8DBAQUFuHIF772HNWu47owIGd0MSvijpgZJSfD1hZnZ3WJmJurqMH/+3UpxMY4exe3bMDfHggWwt9d8p0SnUIwSQki/0NgoIYT0C8UoIYT0C8UoIYT0C8UoIYT0C8UoIYT0C8UoIYT0y/8DFtwnJTwQcuQAAAEaelRYdHJka2l0UEtMIHJka2l0IDIwMjQuMDkuNQAAeJx7v2/tPQYgEABiJgYI4IXiBkY2hgQgzcjMxqABpJlZOCA0E4xmc4CIszlkgGhmRrwMqFoMs8AKGBm5GRgZGJk0mBiZFZhZFFhYM5hY2RLY2DOY2DkSODgzmDi5FLi4NZi4eBQ4WRKcGIGa2Fg4OdjZWMWXgRzJAHN93p/iAw+2LtkP4rRv1zlwxHfCPhA74YTDgfvrptuD2BMOL9q/OM0OrEbknOK+sj2tdiD2+dyP+3IN74LVbI5RsX/3Us8BxH6Tw+LwL1IQzD5SK+EQ8dgXrGZP4SJ7yTZfsPm5k0/ZCxdfBZtZNt/BocqT4wCI3aD2we4glzaYLQYArblDU80uj9gAAAF0elRYdE1PTCByZGtpdCAyMDI0LjA5LjUAAHicfZNRbsMgDIbfcwpfoMg2BpvHtqmmaWoibd3usPfdX7NTdaQSGsQRkA8bfjsTRHuf375/4K/xPE0A+M/TWoOvjIjTFWIAp8vL6wLn2/H0WDmvn8vtAyjHg9Gf2eNtvT5WCM5wyMmI2RgOlNiqoW9JuLW+l4PkVJspOolOkkkekBnW8InuMyIksoIsA1AC9JBKpeVwqZK1jMgSwTFRMYw7Ja7EOIpd76C4SyGPrWyoNADVQUxV3E0DTpozsg04c44T1kZiPvCzEo24tnHZuGr1z4bCZSQjeX78XFwbtrrJiKioI5I2srhLV891qlWLjtQhdiFdcGx+jUgScRMrIzJy4/J4dJEtncolj/S5LPNTodxL57Qucy+d6NzrI3ruReDfQHqqya30fPoEas+auGnPjbhZT4G4ta60uNFeUdpetFOO40W8U4i2lbxTQmJX2V94f72YP/4qH0+/MSuvwpRGWqEAAAC9elRYdFNNSUxFUyByZGtpdCAyMDI0LjA5LjUAAHicJY7LDQJRCEVbcakJQ/jDy8TVFGARtmHx3qfsOBzgXtf9+Xq83vrepf/u9rkfzqNmQ4eyTQ2dh3GtaToEQCccyFkgOSnrpBgI7NZc2+rwzgAT1hxRJ2yWmviPBbxQrLaNNJ3CFZgtMm73LRlLLU0A3FTbwMeQRXgkLOnENxi/SCK9rygnFNuxqzpBkFFWOyG/2orZn6KWRGxUben0+HwBo5c17J+vY1QAAAAASUVORK5CYII=",
      "text/plain": [
       "<rdkit.Chem.rdchem.Mol at 0x1e23c4bf5a0>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from rdkit import Chem\n",
    "\n",
    "Chem.MolFromSmiles(smiles)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
